Some practical preliminary steps are useful before performing longitudinal data analyses. They might even show already whether the data will support your hypothesized growth function or whether a different growth function should be fitted. Here, I go through some of the practical things to do before performing longitudinal data analyses (using a wide or a long data set).
It is important to consider whether the measured variables meet different measurement assumptions, including:
Reliability of the measurement instrument over repeated observations. Good reliability at each occasion suggests longitudinal reliability of the instrument. Cronbach’s \(α\) at each time point can be calculated to show reliability of the measurement instrument. This, however, does not mean that there is reliable change within individuals over time.
Related to reliability of the measurement instrument is ensuring that the changes observed are true changes in individuals and not changes in the measurement instrument or changes in the meaning of the attribute under study over repeated observations. Measurement invariance testing can be useful here, but it can also be difficult when repeated observations span several years (ages), as the meaning of attributes/constructs may differ for people of different ages.
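As a rough illustration of the per-occasion reliability check, the sketch below computes Cronbach’s \(α\) separately at each time point from item-level responses. The data set used in this post only holds the scale scores `lone1` to `lone5`, so the item-level data here are simulated, and every name (`cronbach_alpha`, `simulate_items`, `items_by_wave`) is hypothetical.

```r
# Cronbach's alpha from a matrix or data frame of item responses
# (rows = persons, columns = items); incomplete rows are dropped.
cronbach_alpha <- function(items) {
  items <- as.matrix(items[complete.cases(items), , drop = FALSE])
  k <- ncol(items)
  item_var  <- apply(items, 2, var)   # variance of each item
  total_var <- var(rowSums(items))    # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

# Simulated (hypothetical) item-level data: 4 items per wave, 3 waves
set.seed(123)
n <- 200
simulate_items <- function(n, n_items = 4) {
  true_score <- rnorm(n)
  sapply(seq_len(n_items), function(i) true_score + rnorm(n))
}
items_by_wave <- list(T1 = simulate_items(n),
                      T2 = simulate_items(n),
                      T3 = simulate_items(n))

# One alpha per time point
alpha_by_wave <- sapply(items_by_wave, cronbach_alpha)
round(alpha_by_wave, 2)
```

With real data you would replace `items_by_wave` with the item columns observed at each occasion. The `alpha()` function in the psych package reports the same coefficient along with item-level diagnostics.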
You can read more about things to do before fitting growth models in the reference below:
Grimm, K. J., Ram, N., & Estabrook, R. (2016). Growth modeling: Structural equation and multilevel modeling approaches. Guilford Press.
## Read data
data <- read.csv("/Volumes/anyan-1/frederickanyan.github.io/quantpost_data/data.csv")
# Create new data set with only your main outcome variables
lonely <- data[, c("personid", "lone1", "lone2", "lone3", "lone4", "lone5")]
The first thing to notice from the descriptive statistics is the number and pattern of missing data. The means and standard deviations also show a simple pattern: the feeling of loneliness increases from T1 through T3 and then declines through T5, while the variation likewise increases up to T3 and declines afterwards.
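To make the check on missing data concrete, here is a minimal sketch that counts missing values per time point and tabulates the missing-data patterns. It runs on a simulated stand-in for the wide data set, so `lonely_demo` and its values are hypothetical; only the column names follow the data used in this post.

```r
# Simulated stand-in for the wide data set (hypothetical values)
set.seed(1)
n <- 100
lonely_demo <- data.frame(personid = seq_len(n))
for (v in paste0("lone", 1:5)) {
  x <- rnorm(n, mean = 2, sd = 0.5)
  x[sample(n, 10)] <- NA            # 10 missing values per wave
  lonely_demo[[v]] <- x
}

waves <- paste0("lone", 1:5)

# Number of missing values at each time point
n_missing <- colSums(is.na(lonely_demo[, waves]))
n_missing

# Missing-data patterns: "1" = observed, "0" = missing, one digit per wave
pattern <- apply(!is.na(lonely_demo[, waves]), 1,
                 function(r) paste(as.integer(r), collapse = ""))
pattern_counts <- sort(table(pattern), decreasing = TRUE)
head(pattern_counts)
```

A pattern such as `11100` would indicate dropout after T3, whereas scattered zeros point to intermittent missingness.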
The means already suggest that a linear growth function might not accurately characterize the trajectory in the data.
The feasibility of estimating a growth model can also be gauged by examining the covariance matrix. If the covariances between adjacent time points (T1 and T2; T2 and T3; T3 and T4; T4 and T5) are higher than those between non-adjacent time points, the slope variance is likely to be non-negative. In our covariance matrix, the observed covariances between adjacent time points are 0.15, 0.14, 0.11 and 0.08. These are sometimes higher and sometimes lower than the covariances between non-adjacent time points, so the matrix alone does not rule out a negative slope variance.
The correlations over time provide unique information for longitudinal analysis. Here, most of the correlations show modest associations, indicating that the stability of individual differences across time is modest to high.
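The comparison of adjacent and non-adjacent covariances can be sketched as follows. The data are simulated from a simple linear growth process, so `lonely_demo` and all of its values are hypothetical, and the covariances printed here will not match the ones reported above.

```r
# Simulated stand-in for the wide data set (hypothetical values)
set.seed(42)
n <- 300
intercept <- rnorm(n, mean = 2, sd = 0.5)   # person-specific level
slope     <- rnorm(n, mean = 0, sd = 0.1)   # person-specific change
waves <- paste0("lone", 1:5)
lonely_demo <- data.frame(personid = seq_len(n))
for (w in 1:5) {
  lonely_demo[[waves[w]]] <- intercept + slope * (w - 1) + rnorm(n, sd = 0.3)
}

# Covariance and correlation matrices, using all available pairs
cov_mat <- cov(lonely_demo[, waves], use = "pairwise.complete.obs")
cor_mat <- cor(lonely_demo[, waves], use = "pairwise.complete.obs")

# Lag between each pair of time points (1 = adjacent)
lag   <- abs(row(cov_mat) - col(cov_mat))
upper <- upper.tri(cov_mat)

adjacent_cov     <- cov_mat[upper & lag == 1]   # T1-T2, T2-T3, T3-T4, T4-T5
non_adjacent_cov <- cov_mat[upper & lag > 1]

round(adjacent_cov, 2)
round(non_adjacent_cov, 2)
round(cor_mat, 2)
```

Using `use = "pairwise.complete.obs"` keeps every available pair of observations, which matters when the amount of missing data differs across time points.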
3. Supplement main analysis with bivariate scatter plots and correlations
Code
library(psych) # provides pairs.panels()
# Bivariate scatter plots below the diagonal, histograms on the diagonal,
# and the Pearson correlation above the diagonal.
pairs.panels(lonely[, 2:6], lm = TRUE) # lm = TRUE to fit a regression line
Bivariate scatter plots and correlations, along with histograms, can supplement the main analysis.
4. Examine longitudinal plots
Code
library(lcsm) # provides plot_trajectories()
plot_trajectories(data = lonely,
                  id_var = "personid",
                  var_list = c("lone1", "lone2", "lone3", "lone4", "lone5"),
                  xlab = "Year", ylab = "Loneliness",
                  connect_missing = FALSE, # Do not connect points across missing observations
                  random_sample_frac = 0.05, # Plot a random 5% of the data; adjust to show more or less
                  title_n = TRUE)
Warning: Removed 9 rows containing missing values or values outside the scale range
(`geom_line()`).
Warning: Removed 10 rows containing missing values or values outside the scale range
(`geom_point()`).
Make longitudinal plots that show participants’ loneliness scores on the y-axis and time of observation on the x-axis. Here, you can make one longitudinal plot that visualizes the overall trajectory for all participants and one that visualizes the trajectory for a subset of the participants.
5. Examine separate individual longitudinal plots
Code
library(lcsm)    # provides plot_trajectories()
library(ggplot2) # provides facet_wrap()
plot_trajectories(data = lonely,
                  id_var = "personid",
                  var_list = c("lone1", "lone2", "lone3", "lone4", "lone5"),
                  xlab = "Year", ylab = "Loneliness",
                  connect_missing = FALSE, # Do not connect points across missing observations
                  random_sample_frac = 0.025, # Plot a random 2.5% of the data; adjust to show more or less
                  title_n = TRUE) +
  facet_wrap(~personid)
Warning: Removed 5 rows containing missing values or values outside the scale range
(`geom_line()`).
Warning: Removed 5 rows containing missing values or values outside the scale range
(`geom_point()`).
My personal choice has been to show separate individual longitudinal plots, but you can decide to show whichever one works for you.
Code
## Reshape from wide to long using tidyr
library(tidyr)
lonelylong <- lonely %>%
  pivot_longer(cols = 2:6, names_to = "loneyearly", values_to = "value")
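The reshape above stores the time point as text (`"lone1"`, `"lone2"`, and so on). Growth models usually need time as a number, so the following sketch shows one way to get a numeric time variable during the reshape. It assumes tidyr 1.1 or later (for `names_transform`); the data frame `lonely_demo` and the column name `time` are hypothetical.

```r
library(tidyr)

# Small hypothetical wide data set with the same column layout
lonely_demo <- data.frame(
  personid = 1:3,
  lone1 = c(1.2, 2.0, NA),
  lone2 = c(1.5, 2.1, 1.8),
  lone3 = c(1.9, NA, 2.0),
  lone4 = c(1.7, 2.4, 1.9),
  lone5 = c(1.4, 2.2, 1.6)
)

# Strip the "lone" prefix and convert the remaining digit to an integer,
# so that time can be used as a numeric predictor
lonelylong_demo <- pivot_longer(
  lonely_demo,
  cols = lone1:lone5,
  names_to = "time",
  names_prefix = "lone",
  names_transform = list(time = as.integer),
  values_to = "value"
)

lonelylong_demo
```

Missing values are kept as rows with `NA` in `value`, which preserves the information about which occasions were missed.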
1. Examine univariate and bivariate statistics
Code
library(psych) # provides describe()
# Examine descriptive statistics using the wide data set.
describe(lonely[, 2:6]) # univariate descriptives
The first thing to notice from the descriptive statistics is the number and pattern of missing data. The means and standard deviations also show a simple pattern: the feeling of loneliness increases from T1 through T3 and then declines through T5, while the variation likewise increases up to T3 and declines afterwards.
The means already suggest that a linear growth function might not accurately characterize the trajectory in the data.
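With the long data set, the same look at the means can be done per time point. The sketch below computes the mean and standard deviation at each occasion and then the change in the mean between consecutive occasions. The data and the values in `wave_means` are made up for illustration, and `long_demo` and its numeric `time` column are hypothetical.

```r
# Hypothetical long-format data: 5 waves, means that rise and then fall
set.seed(7)
n <- 200
wave_means <- c(2.0, 2.3, 2.5, 2.4, 2.2)   # made-up values for illustration
long_demo <- data.frame(
  personid = rep(seq_len(n), each = 5),
  time     = rep(1:5, times = n)
)
long_demo$value <- rnorm(nrow(long_demo),
                         mean = wave_means[long_demo$time], sd = 0.4)

# Mean and SD of the outcome at each time point
desc <- aggregate(value ~ time, data = long_demo,
                  FUN = function(x) c(mean = mean(x), sd = sd(x)))
desc <- do.call(data.frame, desc)   # flatten the matrix column
desc

# Change in the mean between consecutive waves; under linear growth
# these differences would be roughly constant and share one sign
mean_change <- diff(desc$value.mean)
round(mean_change, 2)
```

A change of sign in `mean_change`, as in this simulated example, is the kind of pattern that speaks against a purely linear growth function.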
The feasibility of estimating a growth model can also be gauged by examining the covariance matrix. If the covariances between adjacent time points (T1 and T2; T2 and T3; T3 and T4; T4 and T5) are higher than those between non-adjacent time points, the slope variance is likely to be non-negative. In our covariance matrix, the observed covariances between adjacent time points are 0.15, 0.14, 0.11 and 0.08. These are sometimes higher and sometimes lower than the covariances between non-adjacent time points, so the matrix alone does not rule out a negative slope variance.
The correlations over time provide unique information for longitudinal analysis. Here, most of the correlations show modest associations, indicating that the stability of individual differences across time is modest to high.
3. Supplement main analysis with bivariate scatter plots and correlations
Code
library(psych) # provides pairs.panels()
# Bivariate scatter plots below the diagonal, histograms on the diagonal,
# and the Pearson correlation above the diagonal.
pairs.panels(lonely[, 2:6], lm = TRUE) # lm = TRUE to fit a regression line
Bivariate scatter plots and correlations, along with histograms, can supplement the main analysis.
4. Examine longitudinal plots
Code
library(ggplot2)
# Longitudinal plots with the long data set
ggplot(data = lonelylong[which(lonelylong$personid < 26), ], # Select the first 25 participants to show
       aes(x = loneyearly, y = value, group = personid)) +
  geom_line() +
  # geom_smooth(method = lm, se = FALSE, size = 1) +
  xlab("Time of observation") +
  ylab("Loneliness")
Warning: Removed 17 rows containing missing values or values outside the scale range
(`geom_line()`).
Make longitudinal plots that show participants’ loneliness scores on the y-axis and time of observation on the x-axis. Here, you can make one longitudinal plot that visualizes the overall trajectory for all participants and one that visualizes the trajectory for a subset of the participants.
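One way to show the overall trajectory in the same figure is to overlay the mean at each time point on the individual lines. This is only a sketch on simulated data: `long_demo` and its numeric `time` column are hypothetical, and the `linewidth` argument assumes ggplot2 3.4 or later.

```r
library(ggplot2)

# Hypothetical long-format data with a numeric time variable
set.seed(11)
n <- 30
long_demo <- data.frame(
  personid = rep(seq_len(n), each = 5),
  time     = rep(1:5, times = n)
)
long_demo$value <- 2 + rep(rnorm(n, sd = 0.4), each = 5) +
  rnorm(nrow(long_demo), sd = 0.3)

p <- ggplot(long_demo, aes(x = time, y = value, group = personid)) +
  geom_line(alpha = 0.3) +                       # individual trajectories
  stat_summary(aes(group = 1), fun = mean,       # mean at each time point
               geom = "line", linewidth = 1.2) +
  xlab("Time of observation") +
  ylab("Loneliness")
p
```

Setting `group = 1` inside `stat_summary()` overrides the per-person grouping, so the summary line is computed across all participants.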
5. Include separate individual trajectories
Code
library(ggplot2)
# Longitudinal plots with the long data set
ggplot(data = lonelylong[which(lonelylong$personid < 6), ], # Select only five participants
       aes(x = loneyearly, y = value, group = personid)) +
  geom_line() +
  # geom_smooth(method = lm, se = FALSE, size = 1) +
  xlab("Time of observation") +
  ylab("Loneliness") +
  facet_wrap(~personid)
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_line()`).
My personal choice has been to show separate individual longitudinal plots, but you can decide to show whichever one works for you.